In this section we compare the most common terms used by reviewers on Tripadvisor and on Booking. All words were preprocessed with the Snowball stemming library (http://snowballstem.org/), which reduces each word to its stem so that inflected forms are counted as the same term.
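The preprocessing and counting step can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function `term_counts`, the tokenization rule, and the `toy_stem` stand-in are all assumptions made for the example. In practice a real Snowball stemmer would be passed as the `stem` argument.

```python
import re
from collections import Counter
from typing import Callable, Iterable


def term_counts(reviews: Iterable[str],
                stem: Callable[[str], str] = lambda w: w) -> Counter:
    """Count stemmed terms over a collection of review texts."""
    counts = Counter()
    for text in reviews:
        # Lowercase, then keep runs of letters (accented letters included).
        for token in re.findall(r"[^\W\d_]+", text.lower()):
            counts[stem(token)] += 1
    return counts


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


reviews = [
    "Great rooms and great staff",
    "The room was clean, staff friendly",
]
counts = term_counts(reviews, stem=toy_stem)
print(counts.most_common(3))
```

Keeping the stemmer as a parameter makes it easy to swap in a language-specific stemmer for each of the languages analysed.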
This graph shows, for four languages, how the most common words differ between the two datasets.
The X axis shows the number N of most common words considered; the Y axis shows how many of those top-N words differ between the two datasets.
The difference appears to grow roughly linearly over the first 1000 most common words, and amounts to roughly 25% of the words considered.
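One way to compute such a curve is sketched below. The exact definition of "different words" is an assumption here: for each N, the sketch counts the terms in the top N of one dataset that are missing from the top N of the other. The function names and the toy frequencies are hypothetical and only serve the example.

```python
from collections import Counter


def top_k(counts: Counter, k: int) -> list:
    """Return the k most frequent terms, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def top_k_difference(a: Counter, b: Counter, k: int) -> int:
    """Number of terms in the top-k of `a` that are not in the top-k of `b`."""
    return len(set(top_k(a, k)) - set(top_k(b, k)))


# Toy term frequencies, invented for illustration only.
tripadvisor = Counter({"hotel": 50, "room": 40, "staff": 30,
                       "breakfast": 20, "view": 10})
booking = Counter({"room": 45, "hotel": 44, "location": 30,
                   "staff": 25, "clean": 12})

# One (N, difference) point per value of N, as plotted on the X and Y axes.
curve = [(k, top_k_difference(tripadvisor, booking, k)) for k in range(1, 6)]
print(curve)
```

When both datasets contain at least N distinct terms, the count is the same in either direction, so the choice of which dataset is `a` does not matter.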
This graph shows the differences between the two datasets for different cities around the world. This analysis uses only English-language reviews.
The X axis shows the number N of most common words considered; the Y axis shows how many of those top-N words differ between the two datasets.
The graph shows that the differences in the most common words are similar for the two Italian cities, and smaller than for the other two cities.
The following word clouds show the 10 most common words in each of the two datasets for the city of Lucca.
Tripadvisor | Booking |
---|---|
The following table shows, for Lucca, Paris and New York, the union of each city's 10 most common terms, together with the rank of each term in the other cities.
The table highlights that the most frequent words overlap considerably among cities: 11 words appear within the first 15 positions in every set.
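A cross-city rank table of this kind can be built as in the sketch below. It is an illustration under stated assumptions: `ranking` and `rank_table` are hypothetical helper names, ties are broken alphabetically, and the term frequencies are invented rather than taken from the datasets.

```python
from collections import Counter


def ranking(counts: Counter) -> dict:
    """Map each term to its 1-based frequency rank (ties broken alphabetically)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {term: pos for pos, (term, _) in enumerate(ranked, start=1)}


def rank_table(city_counts: dict, n: int) -> dict:
    """For the union of every city's top-n terms, give the rank in each city."""
    ranks = {city: ranking(counts) for city, counts in city_counts.items()}
    union = set()
    for city_ranks in ranks.values():
        union.update(term for term, pos in city_ranks.items() if pos <= n)
    # None marks a term that never occurs in a city's reviews.
    return {term: {city: ranks[city].get(term) for city in ranks}
            for term in sorted(union)}


# Toy frequencies, invented for illustration only.
city_counts = {
    "Lucca": Counter({"hotel": 9, "wall": 7, "room": 5, "staff": 2}),
    "Paris": Counter({"room": 8, "hotel": 6, "metro": 4, "staff": 3}),
}
table = rank_table(city_counts, n=2)
for term, row in table.items():
    print(term, row)
```

Counting how many rows have every rank at or below 15 would give the overlap figure quoted above.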